National Repository of Grey Literature
Extraction of multilingual valency frames from dependency treebanks
Faryad, Ján ; Zeman, Daniel (advisor) ; Lopatková, Markéta (referee)
Multilingual valency dictionaries provide helpful information about the correspondence of valency frames (verbs and their arguments) across various languages. This work aims at developing a program that automatically creates a multilingual valency dictionary, based on parallel treebanks annotated according to Universal Dependencies. This task includes monolingual extraction of valency frames and their cross-lingual linking. Various methods for solving the task are analysed and implemented. The work includes both a general, language-independent approach and additional language-specific extensions, provided in particular for English, Czech and Slovak. The methods for linking the valency frames include using word alignment, morphological and syntactic information contained in the UD annotation, or similarity of verbs between related languages. The quality of the solution is evaluated by multiple established metrics on manually annotated data or by comparison with an existing valency dictionary.
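The monolingual extraction step described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the set of relations treated as argument slots (`ARGUMENT_RELATIONS`), the frame representation, and the hand-made Czech sample sentence are all assumptions chosen for illustration. The code reads one sentence in CoNLL-U format and, for every verb, collects its argument-like dependents together with their morphological case.

```python
from collections import defaultdict

# Assumption: these UD relations count as argument slots; the thesis may use another set.
ARGUMENT_RELATIONS = {"nsubj", "obj", "iobj", "obl", "ccomp", "xcomp"}


def parse_conllu_sentence(block):
    """Parse one CoNLL-U sentence into a list of token dictionaries."""
    tokens = []
    for line in block.strip().splitlines():
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        feats = {}
        if cols[5] != "_":
            feats = dict(pair.split("=", 1) for pair in cols[5].split("|"))
        tokens.append({
            "id": int(cols[0]),
            "lemma": cols[2],
            "upos": cols[3],
            "feats": feats,
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return tokens


def extract_frames(tokens):
    """Return (verb lemma, sorted argument slots) for every verb in the sentence."""
    children = defaultdict(list)
    for tok in tokens:
        children[tok["head"]].append(tok)
    frames = []
    for tok in tokens:
        if tok["upos"] != "VERB":
            continue
        slots = []
        for child in children[tok["id"]]:
            base = child["deprel"].split(":")[0]  # drop subtypes such as obl:arg
            if base in ARGUMENT_RELATIONS:
                slots.append((base, child["feats"].get("Case", "")))
        frames.append((tok["lemma"], tuple(sorted(slots))))
    return frames


# Hand-made illustrative sentence ("Petr dal Marii knihu."), not taken from any treebank.
ROWS = [
    ["1", "Petr", "Petr", "PROPN", "_", "Case=Nom|Number=Sing", "2", "nsubj", "_", "_"],
    ["2", "dal", "dát", "VERB", "_", "Tense=Past", "0", "root", "_", "_"],
    ["3", "Marii", "Marie", "PROPN", "_", "Case=Dat|Number=Sing", "2", "obl:arg", "_", "_"],
    ["4", "knihu", "kniha", "NOUN", "_", "Case=Acc|Number=Sing", "2", "obj", "_", "_"],
    ["5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

frames = extract_frames(parse_conllu_sentence(SAMPLE))
print(frames)
```

Keeping the case value next to each relation is one simple way to make frames comparable across languages with rich morphology; cross-lingual linking would then pair such frames through word alignment, which this sketch does not attempt.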
Robust Parsing of Noisy Content
Daiber, Joachim ; Zeman, Daniel (advisor) ; Mareček, David (referee)
While parsing performance on in-domain text has developed steadily in recent years, out-of-domain text and grammatically noisy text remain an obstacle and often lead to significant decreases in parsing accuracy. In this thesis, we focus on the parsing of noisy content, such as user-generated content in services like Twitter. We investigate the question whether a preprocessing step based on machine translation techniques and unsupervised models for text-normalization can improve parsing performance on noisy data. Existing data sets are evaluated and a new data set for dependency parsing of grammatically noisy Twitter data is introduced. We show that text-normalization together with a combination of domain-specific and generic part-of-speech taggers can lead to a significant improvement in parsing accuracy.
Form and function of nouns in Czech: relation between nominal case and syntactic function. Based on a synchronic written corpus of Czech (SYN2005)
Jelínek, Tomáš ; Petkevič, Vladimír (advisor) ; Lopatková, Markéta (referee) ; Uličný, Oldřich (referee)
Case in Czech is the basic morphological means by which nouns express their function in a sentence. The objective of this thesis is to describe, from a frequency point of view, the relation between form and function of nouns, or, more precisely, how frequently cases (both simple and prepositional) are used to realise syntactic functions in sentences. The thesis is based on one of the largest corpora of written synchronic Czech: the 100-million-token corpus SYN2005. In order to obtain data on frequencies of syntactic functions of nouns in relation to their cases, we annotated the corpus SYN2005 with a dependency syntactic annotation. For this annotation, we adopted the format of the analytical layer of the Prague Dependency Treebank. The syntactic annotation was performed by a stochastic parser: the MST parser. Since the reliability of this annotation was not high enough, we built an automatic correction module, which identifies errors of syntactic annotation in the output of the stochastic parser and corrects these errors by means of linguistic rules. We implemented 26 different rules, but annotation errors were reduced by merely 6-8%. However, this correction module can be further developed. It can be used to correct the output of any dependency parser trained on the data from...
Syntax in methods for information retrieval
Straková, Jana
Title: Information Retrieval Using Syntax Information Author: Bc. Jana Kravalová Department: Institute of Formal and Applied Linguistics Supervisor: Mgr. Pavel Pecina, Ph.D. Supervisor's e-mail address: pecina@ufal.mff.cuni.cz Abstract: In recent years, the application of language modeling in information retrieval has been studied quite extensively. Although language models of any type can be used with this approach, only traditional n-gram models based on surface word order have been employed and described in published experiments (often only unigram language models). The goal of this thesis is to design, implement, and evaluate (on Czech data) a method which would extend a language model with syntactic information, automatically obtained from documents and queries. We attempt to incorporate syntactic information into language models and experimentally compare this approach with unigram and bigram models based on surface word order. We also empirically compare methods for smoothing, stemming and lemmatization, and the effectiveness of using stopwords and pseudo relevance feedback. We perform a detailed analysis of these retrieval methods and describe their performance in detail. Keywords: information retrieval, language modelling, dependency syntax, smoothing
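The unigram baseline mentioned in this abstract can be illustrated with a minimal query-likelihood sketch. The abstract does not say which smoothing methods were compared, so Jelinek-Mercer interpolation is used here purely as an assumption; the function name `query_log_likelihood`, the parameter `lam`, and the toy documents are likewise hypothetical.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, coll_counts, coll_len, lam=0.5):
    """Log P(query | doc) under a unigram model with Jelinek-Mercer smoothing.

    lam is the weight given to the collection model (an illustrative default).
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    log_prob = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        log_prob += math.log(p)
    return log_prob


# Toy collection, invented for illustration only.
docs = {
    "d1": ["parser", "syntax", "tree"],
    "d2": ["retrieval", "query", "syntax"],
}
coll_counts = Counter(term for doc in docs.values() for term in doc)
coll_len = sum(coll_counts.values())

query = ["syntax", "retrieval"]
ranking = sorted(
    docs,
    key=lambda name: query_log_likelihood(query, docs[name], coll_counts, coll_len),
    reverse=True,
)
print(ranking)
```

A syntax-aware variant in the spirit of the thesis would replace surface unigrams or bigrams with units derived from dependency trees, for example head-dependent pairs, while keeping the same scoring scheme; how the thesis actually does this is not stated in the abstract.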
Building a dependency treebank for Yoruba using parallel data
Oluokun, Adedayo ; Zeman, Daniel (advisor) ; Rosa, Rudolf (referee)
The goal of this thesis is to create a dependency treebank for Yorùbá, a language with very few pre-existing machine-readable resources. The treebank follows the Universal Dependencies (UD) annotation standard; certain language-specific guidelines for Yorùbá were specified. Known techniques for porting resources from resource-rich languages were tested, in particular projection of annotation across parallel bilingual data. Manual annotation is not the main focus of this thesis; nevertheless, a small portion of the data was verified manually in order to evaluate the annotation quality. Also, a model was trained on the manual annotation using UDPipe.
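Projection of annotation across parallel data, as mentioned in this abstract, can be sketched in its simplest form as follows. This is an illustration under strong assumptions, not the thesis's procedure: the word alignment is taken to be one-to-one and given in advance, only dependency heads are projected, and the function name `project_heads` and the abstract token ids are hypothetical.

```python
def project_heads(src_heads, alignment):
    """Project dependency heads from a source sentence onto its translation.

    src_heads: source token id -> head id (0 marks the root).
    alignment: source token id -> target token id (assumed one-to-one).
    Target tokens whose source counterpart or head is unaligned get no head.
    """
    tgt_heads = {}
    for src_id, src_head in src_heads.items():
        tgt_id = alignment.get(src_id)
        if tgt_id is None:
            continue  # source token has no counterpart in the target
        if src_head == 0:
            tgt_heads[tgt_id] = 0
        elif src_head in alignment:
            tgt_heads[tgt_id] = alignment[src_head]
    return tgt_heads


# Abstract example: source tokens 1 and 3 depend on token 2, which is the root.
src_heads = {1: 2, 2: 0, 3: 2}
# Source tokens 2 and 3 appear in swapped order in the target sentence.
alignment = {1: 1, 2: 3, 3: 2}

projected = project_heads(src_heads, alignment)
print(projected)
```

Real parallel data rarely aligns one-to-one, so a practical pipeline needs rules for unaligned and many-to-one tokens, which is one reason a manually verified sample is useful for estimating the quality of the projected annotation.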
